Repseek, a tool to retrieve approximate repeats from large DNA sequences
Abstract
The importance of genome redundancy has been strongly emphasized in the field of genome dynamics and evolution as well as in medical biology. A repeat is a sequence present twice or more with a high degree of similarity within a larger sequence (e.g. a chromosome) or set of sequences (e.g. a genome with several chromosomes). Each instance of the repeated sub-sequence is called a "copy" of the repeat. We use the term "duplication" to denote any active mechanistic event that creates a repeat. Even though spurious duplication events (or recombination events between repeats) can cause severe disorders [26, 24], repeated elements nonetheless remain a very important driving force of genome evolution [28]. In that respect, the dynamics and evolution of these redundant sequences have been studied in bacterial genomes [31, 32, 5] as well as in eukaryotic genomes [3, 4, 38]. Duplication events can sometimes copy entire coding regions, giving rise to what are often referred to as duplicate genes. These duplicate genes are the raw material leading to the emergence of novel functions and have been extensively studied (for a historical review see [37]). Although the repeats we are interested in encompass many known biological repeated elements (e.g. transposable elements, duplicated genes, DNA satellites, segmental duplications), our main concern is not to identify specific families of repeats, but to extract repeats on the sole basis of their sequence similarity, without any prior consideration of their biological function. Unlike RepeatMasker [34], we do not search for already well-characterized repeated elements. Furthermore, our primary goal is not to construct families of repeats. This is the objective of dedicated software such as RepeatScout [30] or of clustering algorithms [9, 29], which reconstruct families from pairs of repeats. Of course, our program can be used to feed these clustering algorithms.
While there are some widely accepted methods to detect duplicate genes in a genome (for instance, those based on the BLAST or FASTA programs), there is no firmly established technique for detecting repeats in large DNA sequences. The detection of repeats is not a trivial problem, and there is no satisfactory methodology available apart from recursive local alignment (using dynamic programming) of sequences with themselves [41]. Such algorithms, however, are quadratic in computation time and in memory usage and
Similar resources
Repseek, a tool to retrieve approximate repeats from large DNA sequences
Chromosomes or other long DNA sequences contain many highly similar repeated sub-sequences. While there are efficient methods for detecting strict repeats or detecting already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences allowing for weighted substitutions and indels in a coherent statistical framework. Here, we presen...
Approximate resistivity and susceptibility mapping from airborne electromagnetic and magnetic data, a case study for a geologically plausible porphyry copper unit in Iran
This paper describes the application of approximate methods to invert airborne magnetic data as well as helicopter-borne frequency-domain electromagnetic data in order to retrieve a joint model of magnetic susceptibility and electrical resistivity. The study area, located in Semnan province of Iran, consists of an arc-shaped porphyry andesite covered by sedimentary units which may have potential ...
Molecular and Bioinformatics Analysis of Allelic Diversity in IGFBP2 Gene Promoter in Indigenous Makuee and Lori-Bakhtiari Sheep Breeds
The aim of this study was to perform molecular and bioinformatics analysis of IGFBP2 gene promoter in association with some economic traits in indigenous Makuee (MS) and Lori-Bakhtiari (LB) breeds. DNA was extracted from blood samples of 120 MS and 200 LB and a 297 bp fragment from the upstream sequences of studied gene was amplified and genotyped by single-strand conformational polymo...
Automatic Collection of Functional Sequence Based on Annotations in the GenBank Entry
It has become important to develop software tools that enable researchers to collect the DNA sequences they need from databases on the Internet. We have developed a client-server tool that aims to automatically retrieve the necessary sequences onto a client machine, based on a keyword search, by connecting to the GenBank database server [1]. Our tool extracts the required sequence based on the annotations, espec...
Research Article: Molecular genetic divergence of five genera of cypriniform fish in Iran assessed by DNA barcoding
The present study represents a comprehensive molecular assessment of some families of freshwater fishes in Iran. We analyzed cytochrome oxidase I (COI) sequences for five genera of cypriniform fishes from Iran. The present investigation provides data on the genetic structure of some species of Nemachilidae including Paraschistura bampurensis, Oxynoemacheilus kiabii and Turcinemacheilus saadii and Leuc...